[SVLS-8979] Add CloudFormation template for Lambda Durable Function event forwarder#1149
Open
lym953 wants to merge 9 commits into
Captures AWS Lambda Durable Function execution status change events from EventBridge and delivers them to the Datadog HTTP intake via Amazon Data Firehose. Records arrive at Datadog as the raw EventBridge envelope; reshaping (ARN qualifier stripping, detail.* flattening, ISO timestamp parsing) is configured on the Datadog side via a logs processing pipeline rather than inside the stack.

Resources created (9): S3 backup bucket + policy, Firehose delivery stream + role + log group + 2 log streams, EventBridge rule + role. When DdApiKeyKmsCiphertext is set, four additional resources are provisioned to decrypt the API key at deploy time via a custom resource (IAM role, Lambda, log group, Custom::DatadogApiKeyKmsDecrypt).

Four mutually exclusive API-key options:
- DdApiKey (plaintext, NoEcho)
- DdApiKeySecretArn (Secrets Manager dynamic reference)
- DdApiKeySsmParameterName (SSM SecureString dynamic reference)
- DdApiKeyKmsCiphertext + DdApiKeyKmsKeyArn (deploy-time decrypter)

Five independent function-name filter slots (FunctionNameFilter1..5), each producing two matchers (unqualified ARN + version/alias-qualified ARN), so events for any qualifier form are captured. Empty slots are stripped from the EventBridge rule at deploy time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Publishes template.yaml to the public datadog-cloudformation-template bucket at aws/lambda-durable-function-event-forwarder/<version>.yaml (+ latest.yaml), authenticating to the Datadog Prod account (464622532012) via the prod-engineering role. Requires a semver version arg, validates the template, and refuses to overwrite an already-published version. README publishing section updated to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- FunctionNameFilter{1..5} -> FunctionArnFilter{1..5}: users now supply an
unqualified function ARN (or wildcard over one) instead of a bare name.
We append ":*" since detail.functionArn is always version/alias-qualified,
and an AllowedPattern rejects a pasted qualified ARN at deploy time.
- Statuses now defaults to "" (forward all). The status key is dropped from
the EventPattern when empty, and the whole detail block is omitted when
neither a status nor a function filter is set (empty detail:{} is invalid).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
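The filter behavior described in this commit can be sketched in Python. This is only an illustration of the stated rules (append `:*` to each ARN filter, drop the status key when `Statuses` is empty, omit `detail` entirely when nothing is set); it is not the template's actual logic, which lives in CloudFormation conditions. The function name `build_detail`, the regex `UNQUALIFIED_ARN`, the comma-separated form of `Statuses`, and the use of EventBridge's `wildcard` matcher are assumptions made for the example.

```python
import re

# Hypothetical stand-in for the template's AllowedPattern: accepts an
# unqualified function ARN (wildcards allowed in the name) and rejects
# one that already ends in a :version or :alias qualifier.
UNQUALIFIED_ARN = re.compile(
    r"^arn:aws[a-z-]*:lambda:[a-z0-9-]+:\d{12}:function:[A-Za-z0-9_*-]+$"
)


def build_detail(statuses, arn_filters):
    """Return the `detail` block of the EventPattern, or None if no filter is set."""
    detail = {}

    # An empty Statuses string means "forward all": no status key at all.
    status_list = [s.strip() for s in statuses.split(",") if s.strip()]
    if status_list:
        detail["status"] = status_list

    matchers = []
    for arn in arn_filters:
        if not arn:
            continue  # empty filter slot, stripped from the rule
        if not UNQUALIFIED_ARN.match(arn):
            raise ValueError(f"expected an unqualified function ARN: {arn}")
        # detail.functionArn is always qualified, so match any qualifier.
        matchers.append({"wildcard": arn + ":*"})
    if matchers:
        detail["functionArn"] = matchers

    # An empty detail:{} is invalid, so signal "omit the block" with None.
    return detail or None
```

A caller would add the returned block to the rule's event pattern only when it is not `None`, for example `build_detail("FAILED,TIMED_OUT", ["arn:aws:lambda:us-east-2:425362996713:function:my-durable-*"])`.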
- Remove the DdApiKeyKmsCiphertext / DdApiKeyKmsKeyArn key path and its four deploy-time decrypter resources (Lambda + role + log group + custom resource). It was carried over from the Lambda forwarder's DD_KMS_API_KEY pattern, but that forwarder already has a runtime Lambda; here it meant a whole custom-resource decrypter for the least-secure of the options. The two dynamic-reference paths (Secrets Manager, SSM SecureString) cover the "keep plaintext out of the template" need more securely at zero resource cost. API key is now one of three. Also drops the now-unneeded W1030 cfn-lint suppression.
- Trim verbose parameter descriptions; list valid Statuses values (RUNNING/SUCCEEDED/FAILED/TIMED_OUT/STOPPED); replace em-dashes with ASCII so they render correctly in the CloudFormation console.
- release.sh: validate with local cfn-lint instead of cloudformation:ValidateTemplate (the publishing role is scoped to S3).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move implementation detail out of customer-facing parameter Descriptions: drop the Firehose URL derivation from DdSite, the service-taxonomy guidance from DdService, and the status-matching/EventBridge-rule notes from Statuses. Preserve the one fact not otherwise self-documenting (the API key becomes the X-Amz-Firehose-Access-Key header; dynamic references keep plaintext out of the template) as a comment at the AccessKey field. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Append "(Optional)" to the Event filters parameter group and "(optional)" to the Statuses label, matching the Lambda forwarder's labeling and the sibling FunctionArnFilter labels, so the console makes clear nothing in the section is required. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Publishing is handled in a separate PR. Remove the release.sh usage steps and release checklist from the README and the release.sh row from the Files table, keeping the published-URL pattern and quick-create link. (The release.sh file itself was removed in the previous commit.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DdService, DdEnv, DdVersion, and DdTags were declared but never wired to anything (the Firehose intake can't carry them as proper facets, so the stack transmitted nothing). They misled users into thinking they set tags. Remove them and the dead HasEnv/HasVersion/HasTags conditions; DdSite stays (it builds the Firehose URL). Service/env/version/tags are set in the Datadog log processing pipeline instead. cfn-lint is now warning-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Cut the Publishing/Deploying/nested-stack/Filtering/Files/Notes sections; the Datadog-side pipeline section now just says to install the AWS Lambda integration (its OOTB logs pipeline is provisioned automatically).
- Correct the example payload to the field names from AWS's "Monitoring durable functions" doc (durableExecutionArn, durableExecutionName, functionArn, status, startTimestamp; endTimestamp for terminal states) and link the doc. The previous executionName/executionStartTime/executionEndTime fields were inaccurate.
- Remove the speculative auto-tag enumeration / integration attribution; state only that tagging and reshaping happen on the Datadog side.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
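To make the Datadog-side reshaping concrete, here is a rough Python sketch of the three steps the commits name (ARN qualifier stripping, detail.* flattening, ISO timestamp parsing), applied to the field names listed above. In practice this work is done by the Datadog logs pipeline, not by code in this PR; the function name `reshape` and the sample envelope values are invented for illustration, and the timestamp format is assumed to be ISO 8601 with a trailing `Z`.

```python
from datetime import datetime


def reshape(envelope):
    """Flatten a raw EventBridge envelope the way the pipeline is described to."""
    # detail.* flattening: lift the detail fields to the top level.
    flat = {k: v for k, v in envelope.items() if k != "detail"}
    flat.update(envelope.get("detail", {}))

    # ARN qualifier stripping: keep the first seven colon-separated parts,
    # arn:aws:lambda:<region>:<account>:function:<name>, and drop any
    # trailing :version or :alias.
    arn = flat.get("functionArn")
    if arn:
        flat["functionArn"] = ":".join(arn.split(":")[:7])

    # ISO timestamp parsing (endTimestamp only exists for terminal states).
    for key in ("startTimestamp", "endTimestamp"):
        if key in flat:
            flat[key] = datetime.fromisoformat(flat[key].replace("Z", "+00:00"))

    return flat


# Made-up sample envelope; only the detail field names come from the commit.
sample = {
    "id": "example-event-id",
    "detail": {
        "functionArn": "arn:aws:lambda:us-east-2:123456789012:function:my-fn:1",
        "status": "TIMED_OUT",
        "startTimestamp": "2025-01-01T00:00:00Z",
    },
}
reshaped = reshape(sample)
```

After the call, `reshaped["functionArn"]` no longer carries the `:1` qualifier and `reshaped["startTimestamp"]` is a timezone-aware `datetime`.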
Pull request overview
Adds an installable, self-contained CloudFormation stack that captures AWS Lambda Durable Function execution status-change events from EventBridge and forwards them to Datadog via a Kinesis Data Firehose HTTP endpoint, with an S3 bucket as a failed-records backup.
Changes:
- Introduces a new CloudFormation template that provisions EventBridge rule, Firehose delivery stream, IAM roles, and an S3 backup bucket.
- Documents architecture, parameters, outputs, and the forwarded EventBridge event shape in a new README.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| aws/durable_function_event_forwarder/template.yaml | Provisions EventBridge → Firehose → Datadog intake pipeline plus S3 failed-records backup. |
| aws/durable_function_event_forwarder/README.md | Explains deployment parameters, outputs, and the raw EventBridge envelope forwarded to Datadog. |
Comment on lines +273 to +277

    - Effect: Allow
      Action:
        - logs:PutLogEvents
      Resource:
        - !GetAtt FirehoseLogGroup.Arn

    Version: "0.1.0"

    Parameters:
      # ---- Datadog API key (exactly one of the three is required) ----
Comment on lines +292 to +296

    # The API key becomes the X-Amz-Firehose-Access-Key header on each
    # request and is stored opaquely by Firehose. The two dynamic-
    # reference paths resolve the value straight into this resource at
    # deploy time, so the plaintext never appears in the template source,
    # the stack parameters, or stack events.
Context
To capture TIMED_OUT and STOPPED status of Lambda durable function executions, we need to capture the status change events in EventBridge and forward them to Datadog. This involves three changes:
Architecture
This is Option 4.3 in the design doc. See the doc for why we need to capture the status change events.
Changes
README.md
Params of the CloudFormation template
- DdSite: datadoghq.com
- Statuses: RUNNING,SUCCEEDED,FAILED,TIMED_OUT,STOPPED.
- FunctionArnFilter1..5: an unqualified function ARN, e.g. arn:aws:lambda:us-east-2:425362996713:function:my-durable-function, or a wildcard pattern, e.g. arn:aws:lambda:us-east-2:425362996713:function:my-durable-*.
Next steps
Test plan
Steps
arn:aws:lambda:us-east-2:425362996713:function:yiming-durable-py-custom-tracer
yiming-durable-py-custom-tracer
Result
After a few minutes, the logs appeared in Datadog.
